A Model for Folding and Aggregation in RNA Secondary Structures 
We study the statistical mechanics of RNA secondary structures designed to have an attraction
between two different types of structures as a model system for heteropolymer aggregation. The
competition between the branching entropy of the secondary structure and the energy gained by
pairing drives the RNA to undergo a 'temperature independent' second order phase transition from
a molten to an aggregated phase. The aggregated phase thus obtained has a macroscopically large
number of contacts between different RNAs. The partition function scaling exponent for this phase
is $\theta \approx 1/2$ and the crossover exponent of the phase transition is $\nu \approx 5/3$. The relevance of these
calculations to the aggregation of biological molecules is discussed.

PACS numbers: 87.15.Aa, 87.15.Cc, 64.60.Fr, 87.15.Nn
RNA secondary structures are an excellent model system for studying folding phenomena in
heteropolymers. Unlike the protein folding problem, where a large number of different monomers
needs to be taken into account to understand folding [1], an RNA has just four bases: A, U, C,
and G. The interactions between these bases are simpler than in the protein folding problem
because the energy scales of the secondary and the tertiary structure are separable. These
features make RNA secondary structures a model that is amenable, both analytically and
numerically, to the rigorous study of generic thermodynamic properties of heteropolymer folding.

Quite a lot is known about the folding thermodynamics of single RNA molecules. At low
temperatures, where monomer specific binding energies and sequence heterogeneity are important,
the resulting (frozen) phase is glassy. At high temperatures, thermal fluctuations lead to a
denatured phase, where the backbone is randomly coiled (without any binding) like a self-avoiding
random walk. At intermediate temperatures, where an effective attraction between short segments
is important, the molecules are expected to be in the so-called molten phase. In this molten
phase many different secondary structures, all having comparable energies (within $O(k_B T)$),
coexist. If the tendency of biological sequences to be designed to fold into a specific,
functional structure is taken into account, the native phase emerges. Many important questions
have been raised with regard to these phases, e.g. their stability, their characteristics, and
the properties of the phase transitions between them, in the context of both protein and RNA
folding. In this Letter we begin to address another important aspect of heteropolymer folding,
namely the competition between the individual folding of the molecules and the aggregation of
several molecules, using the RNA secondary structure formulation.

In the context of protein folding, the competition between individual folding and misfolding
associated with aggregation is a very important phenomenon. The failure of protein molecules to
fold correctly and the associated formation of alternative structures stabilized by aggregation
is associated with various diseases such as Alzheimer's, Mad Cow, and Parkinson's. Thus, this
phenomenon has been studied with the tools available for the protein folding problem in various
contexts. But the competition between individual and aggregated structures also plays an
important role in the realm of RNA folding, e.g., in the growing field of riboswitches [11]. In
these riboswitches, the aggregation of two RNA molecules through base pairing, in competition
with the base pairing of the individual molecules, is used to regulate the expression of genes
as a function of the concentrations of the RNAs involved. Even the local structure of double
stranded DNA in the repeat regions of the genes involved in triplet repeat diseases
(Huntington's, fragile X, etc. [12]) is an example of an aggregated structure (the double
stranded DNA) competing with the multitude of secondary structures that the single strands of
this DNA can form by themselves, since their repeat units (e.g., CAG and CTG in Huntington's
disease) allow self-pairing as well. Here, we approach the phenomena associated with the
competition between intra-molecular structure and aggregation by considering a toy model for the
phase transition of an RNA secondary structure from the molten to an aggregated phase. While our
model is literally applicable to the above-mentioned triplet repeat disease genes, we see it
more broadly as a basic model for studying the competition between intra-molecular structure and
aggregation, into which aspects of the other scenarios discussed above, such as native states
and simultaneous aggregation of several molecules, can later be incorporated. In studying our
model, our focus is on the thermodynamic properties of the system. Thus, we solve the model
exactly in the thermodynamic limit and calculate the critical exponents relevant to the phase
transition.

RNA is a biopolymer with four different monomers A, 
U, C and G in its sequence. The Watson-Crick pairs A-U 
and C-G are energetically the most favorable pairs while 
G-U is marginally stable and the other combinations are 
prohibited. By an RNA secondary structure, we mean
a collection of binding pairs $(i,j)$ with $1 \le i < j \le N$, where $N$ is the number of bases
in the sequence. Any two pairs $(i_1,j_1)$ and $(i_2,j_2)$ are either nested, i.e.
$i_1 < i_2 < j_2 < j_1$, or independent, i.e. $i_1 < j_1 < i_2 < j_2$. This restriction means
that we do not allow pseudo-knots, which are generally energetically not as favorable [13]. Such
a secondary structure can be represented by a helix diagram, a non-crossing arch diagram, or a
mountain representation, as shown in Fig. 1.

FIG. 1: Abstract representations of RNA secondary structures (from [4]). (a) Helix
representation. (b) Non-crossing arch diagram. Here, the solid line corresponds to the backbone
of the RNA. The dashed arches correspond to the base pairs. The absence of pseudo-knots implies
that the arches never cross. (c) Mountain representation. Here, as we go along the backbone of
the RNA from base 1 to N (represented by the base line), we go one step up for the beginning of
a pair, one step down for the closing of a pair, and take a horizontal step for no pairing. Such
a mountain never crosses the baseline and always returns to the baseline at the end.
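The definitions above translate directly into a few lines of code. The following is a minimal sketch, not part of the original calculation: the function names `is_valid_structure` and `mountain_heights` are hypothetical, and a structure is assumed to be given as a list of 1-based pairs $(i,j)$.

```python
def is_valid_structure(pairs, n):
    """Check 1 <= i < j <= n, at most one partner per base, and no pseudo-knots."""
    seen = set()
    for i, j in pairs:
        if not (1 <= i < j <= n) or i in seen or j in seen:
            return False
        seen.update((i, j))
    for a, (i1, j1) in enumerate(pairs):
        for i2, j2 in pairs[a + 1:]:
            # Two arches cross (pseudo-knot) exactly when they interleave.
            if i1 < i2 < j1 < j2 or i2 < i1 < j2 < j1:
                return False
    return True


def mountain_heights(pairs, n):
    """Mountain representation: +1 at an opening base, -1 at a closing base,
    0 at an unpaired base; returns the height after each of the n bases."""
    step = [0] * (n + 1)
    for i, j in pairs:
        step[i] = 1
        step[j] = -1
    heights, h = [], 0
    for pos in range(1, n + 1):
        h += step[pos]
        heights.append(h)
    return heights


if __name__ == "__main__":
    structure = [(1, 6), (2, 5), (7, 8)]
    print(is_valid_structure(structure, 8))
    print(mountain_heights(structure, 8))
```

For a valid structure the heights never become negative and the last entry is zero, which is the statement that the mountain never crosses the baseline and returns to it at the end.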

Let the free energy associated with the pairing of bases $i$ and $j$ in an RNA be
$\epsilon_{ij}$. This free energy has contributions from the energy gained by binding and from
the associated loss of configurational entropy. In addition, there are in principle entropic
and/or energetic effects due to loop formation, stacking, etc. Even though accurate,
experimentally determined parameters are essential for calculating the exact secondary
structure, such microscopic details, as well as the exact values of the energies
$\epsilon_{ij}$, do not affect the asymptotic properties of the phases or the critical
exponents. Hence, we ignore them in our model calculations.

To understand the phase transition from the molten 
to the aggregated phase, we first define the aggregated 
phase as an ensemble of RNA secondary structures in 
which a macroscopically large number of contacts occur 
between two different RNAs. We consider a dual RNA
system consisting of two types of RNA in solution, which we refer to as RNA-1 and RNA-2.
Individually, RNA-1 and RNA-2 are in the molten phase. However, when they are together in
solution, base pairings between bases from different molecules are also possible. We study the
phases of this dual RNA system as the strength of the inter-RNA attraction (the bias) is varied.

To do so, we assume a simple pairing energy model 
with the free energy of pairing between bases i and j 
defined as: 

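$$\epsilon_{ij} = \begin{cases} \epsilon_1 & \text{if } i, j \in \text{RNA-1} \\ \epsilon_2 & \text{if } i, j \in \text{RNA-2} \\ \epsilon_3 & \text{if } i \in \text{RNA-1},\ j \in \text{RNA-2} \text{ or vice versa} \end{cases} \qquad (1)$$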

Here, the intra-RNA base pairing energies $\epsilon_1$ and $\epsilon_2$ could be of comparable
magnitude in a realistic RNA molecule. The inter-RNA base pairing energy, or the bias,
$\epsilon_3$ is the parameter which can in principle be controlled by sequence mutation. Note
that neglecting sequence heterogeneity in this kind of model was established as a useful
approximation at not too low temperatures in a similar context (see also the discussion at the
end of this Letter).

Denote the Boltzmann factors corresponding to the pairing energies by $q_1$, $q_2$ and $q_3$,
respectively. We show that this simple model predicts a transition from the molten to an
aggregated phase as we tune the parameter $q_3$. We do so by exploiting the recursion relation

$$Z_{i,j} = Z_{i,j-1} + \sum_{k=i}^{j-1} Z_{i,k-1}\, e^{-\epsilon_{jk}/T}\, Z_{k+1,j-1} \qquad (2)$$

for the partition function $Z_{i,j}$ of a sequence of bases from $i$ to $j$, which can be
evaluated in $O(N^3)$ time starting from the initial conditions $Z_{i,i} = Z_{i,i-1} = 1$.
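As an illustration of how Eq. (2) can be iterated for the three-parameter pairing model of Eq. (1), here is a minimal sketch. It assumes that the two RNAs are simply concatenated into one sequence, that there is no minimum hairpin length, and that the Boltzmann factors are passed in directly; the function name `dual_partition_function` and its signature are hypothetical.

```python
def dual_partition_function(n1, n2, q1, q2, q3):
    """Partition function of RNA-1 (n1 bases) concatenated with RNA-2 (n2 bases),
    obtained by iterating the recursion of Eq. (2) over increasing segment length."""
    n = n1 + n2

    def weight(k, j):
        # Boltzmann factor for pairing bases k < j (1-based positions).
        if j <= n1:
            return q1   # both bases in RNA-1
        if k > n1:
            return q2   # both bases in RNA-2
        return q3       # one base in each molecule

    # Z[i][j] holds the partition function of the segment i..j.
    # Empty segments (j = i - 1) and single bases (j = i) have Z = 1.
    Z = [[1] * (n + 2) for _ in range(n + 2)]
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            total = Z[i][j - 1]                  # base j unpaired
            for k in range(i, j):                # base j paired with base k
                total += Z[i][k - 1] * weight(k, j) * Z[k + 1][j - 1]
            Z[i][j] = total
    return Z[1][n]


if __name__ == "__main__":
    # With all Boltzmann factors equal to 1 the result counts the structures.
    print(dual_partition_function(2, 2, 1, 1, 1))
```

The three nested loops make the $O(N^3)$ cost explicit. Passing integer Boltzmann factors keeps the arithmetic exact, which is convenient for checking small cases by hand.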

To keep the analytical calculations simple, we assume each RNA to be of equal length,
containing $N - 1$ bases [16]. We now consider the joint folding of these two RNAs and denote
its partition function by $Z_d(N; q_1, q_2, q_3)$. As explained before, the free energy of
pairing for bases belonging to a given RNA has contributions from the energy gain due to the
pairing and the entropy loss associated with loop formation. This holds true even for pairing
between bases belonging to different RNAs. But when the first pairing between bases belonging
to different RNAs occurs, there is an additional entropy loss due to the breaking of
translational invariance. Thereafter, only the free energy $\epsilon_3$ plays a role in the
inter-RNA base pairing. In the thermodynamic limit, this additional entropy loss has no effect
on the phase of the system; it is the energetics of pairing that drives the phase transition.
Hence, we ignore this additional entropic term. This essentially reduces the problem to the
folding of a single sequence with $2N - 2$ bases. The aggregated secondary structure can now be
interpreted as having a macroscopically large number of contacts between the two halves of the
concatenated RNA.

Let us first consider two special cases. Setting $q = q_1 = q_2 = q_3$ corresponds to the
well-known molten phase of the RNA secondary structure, whose partition function can be
calculated exactly and has the asymptotic form
$Z_d(N; q, q, q) = Z_0(2N; q) = A(q)\,(2N)^{-\theta}\, z_c(q)^{2N}$ with the characteristic
scaling exponent $\theta = 3/2$ [5]. This exponent is characteristic in the sense that it is
insensitive to various microscopic details of the RNA secondary structure such as the cost of a
hairpin loop, weak sequence heterogeneity, etc. The other simple case is $q_3 = 0$. This case
describes two RNAs in the molten phase which do not know of each other's presence. The partition
function of such a dual RNA is then just the product of the individual partition functions, i.e.
$Z_d(N; q_1, q_2, 0) = Z_0(N; q_1)\, Z_0(N; q_2)$. Hence the scaling exponent is $\theta = 3$.
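The value $\theta = 3$ follows from multiplying the two single-molecule asymptotic forms. Written out, assuming each factor has the molten form $Z_0(N; q) \simeq A(q)\, N^{-3/2}\, z_c(q)^{N}$ quoted above:

```latex
Z_d(N; q_1, q_2, 0)
  = Z_0(N; q_1)\, Z_0(N; q_2)
  \simeq A(q_1) A(q_2)\, N^{-3/2} N^{-3/2}\, \bigl[ z_c(q_1)\, z_c(q_2) \bigr]^{N}
  = A(q_1) A(q_2)\, N^{-3}\, \bigl[ z_c(q_1)\, z_c(q_2) \bigr]^{N}
```

The power-law prefactors multiply, so the exponents add: $3/2 + 3/2 = 3$.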
FIG. 2: (color online) The behavior of the partition function
$Z_d(z; q_1 = 4, q_2 = 9, q_3)$. For $q_3 = q_{3c} = 6$, we observe a square root behavior. For
$q_3 > q_{3c}$, we see an inverse square root behavior. The inset shows the resulting phase
diagram.

We now want to understand the case of general $q_1$, $q_2$ and $q_3$. To this end we calculate
the partition function of the dual RNA as follows. Let the base pairings within a given RNA be
called primary and those between different RNAs be called secondary. Any given secondary
structure has a series of secondary pairings $(i_1, j_1), \ldots, (i_k, j_k)$ such that
$1 \le i_1 < \ldots < i_k \le N-1$ and $1 \le j_1 < \ldots < j_k \le N-1$. Note that we have
labeled the bases of RNA-1 by the indices $i$ and those of RNA-2 by the indices $j$. The bubbles
formed between any two consecutive secondary pairings are allowed to have only primary pairings.
If all the secondary structure configurations are enumerated according to the number $k$ of
inter-RNA (secondary) contacts, then the total partition function of this dual RNA system can be
written in the z-transform representation as:

$$Z_d(z; q_1, q_2, q_3) = \sum_{k=0}^{\infty} q_3^{k}\, Z_0(z; q_1)^{k+1} * Z_0(z; q_2)^{k+1} \qquad (3)$$

$$= \oint \frac{dz'}{2\pi i z'}\, \frac{Z_0(z'; q_1)\, Z_0(z/z'; q_2)}{1 - q_3\, Z_0(z'; q_1)\, Z_0(z/z'; q_2)} \qquad (4)$$

where $Z_d(z; q_1, q_2, q_3)$ and $Z_0(z; q)$ are the z-transforms of $Z_d(N; q_1, q_2, q_3)$
and $Z_0(N; q)$, respectively. The symbol $*$ indicates the convolution in z-space, defined as
$f * g = \oint \frac{dz'}{2\pi i z'}\, f(z')\, g(z/z')$. Eq. (4) is obtained by summing up the
geometric series in Eq. (3). The convolution integral can be evaluated numerically to obtain the
singularities of $Z_d$ and hence the asymptotic behavior of $Z_d(N; q_1, q_2, q_3)$. The results
are shown in Fig. 2. For $q_3 = q_{3c} = \sqrt{q_1 q_2}$, we find a square root singularity and
hence $\theta = 3/2$ [17], the characteristic exponent of the molten phase. For $q_3 > q_{3c}$,
$Z_d$ has an inverse square root singularity, indicating a new phase. We interpret the new
phase, with the partition function scaling exponent $\theta \approx 1/2$, as the aggregated
phase. We claim that for all $q_3 < q_{3c}$, the dual RNA system is asymptotically in the same
phase as for $q_3 = 0$, hence $\theta = 3$. This claim is verified by numerical calculations of
the exact partition function for finite length and by the calculation of an asymptotic
macroscopic quantity (the order parameter) to be defined below. The resulting simple phase
diagram is shown in the inset of Fig. 2.
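The resummation that turns the series over the number of secondary contacts into a closed integrand can be written out explicitly. The shorthand $x$ below is introduced here only for illustration, and the sum is assumed to converge, i.e. $|q_3 x| < 1$ along the integration contour:

```latex
x \equiv Z_0(z'; q_1)\, Z_0(z/z'; q_2), \qquad
\sum_{k=0}^{\infty} q_3^{k}\, x^{k+1}
  = x \sum_{k=0}^{\infty} (q_3 x)^{k}
  = \frac{x}{1 - q_3 x}
```

Each term of the series carries one factor of $q_3$ per secondary contact and one factor of $x$ per bubble, so the point where $q_3 x$ reaches 1 is where the denominator of the closed form can generate a new singularity.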

In order to verify that the phase transition indeed happens at $q_{3c} = \sqrt{q_1 q_2}$, we
calculate the order parameter of the phase transition. Here, the order parameter $Q$ is defined
as the fraction of secondary pairings in a secondary structure, an important structural property
of the aggregate. For arbitrary $q_3$ the order parameter can be calculated exactly from
$Q = \lim_{N \to \infty} (\partial \ln Z_d / \partial \ln q_3)/N$. The inset of Fig. 3 clearly
shows that $Q = 0$ for $q_3 < q_{3c}$ and that $Q$ increases continuously with $q_3$ thereafter,
saturating to $Q = 1$ for $q_3/q_{3c} \gg 1$. From this behavior of the order parameter we can
conclude that the phase transition indeed occurs at $q_{3c} = \sqrt{q_1 q_2}$ and that the phase
transition is of second order. Physically, we can understand the behavior of the order parameter
by using the mountain representation of RNA (see Fig. 1c). Between any two consecutive secondary
pairings, the contribution of primary pairs to the height of the mountain is zero. Hence, the
total number of secondary pairings is equal to the height $h$ of the mountain at its midpoint.
At $q_3 = q_{3c}$, where the concatenated molecule is molten, the random walk analogy gives
$\langle h \rangle \sim O(N^{1/2})$, hence $Q \sim O(N^{-1/2})$. For $q_3 < q_{3c}$, the
secondary pairings are even less likely, and hence in the thermodynamic limit $Q = 0$ for
$q_3 < q_{3c}$, consistent with what we have obtained from the exact expression.
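For finite lengths, the fraction of secondary pairings can be evaluated directly from the contact-resolved terms of Eq. (3). The following is a minimal sketch under stated assumptions: it works with ordinary power series in place of the contour integral, takes each RNA to have `m` bases with homogeneous Boltzmann factors, and normalizes by `m`. All function names are hypothetical.

```python
def single_rna_series(q, n_max):
    """g[n] = partition function of one homogeneous RNA with n bases (g[0] = 1)."""
    g = [1] * (n_max + 1)
    for n in range(2, n_max + 1):
        # Last base unpaired, or paired with base k: k-1 bases outside, n-1-k enclosed.
        g[n] = g[n - 1] + q * sum(g[k - 1] * g[n - 1 - k] for k in range(1, n))
    return g


def poly_mul(a, b, n_max):
    """Product of two power series, truncated at order n_max."""
    out = [0] * (n_max + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j > n_max:
                break
            out[i + j] += ai * bj
    return out


def contact_weights(m, q1, q2, q3):
    """w[k] = total Boltzmann weight of structures with exactly k inter-RNA pairs
    for two RNAs of m bases each (the k-th term of the sum in Eq. (3))."""
    g1 = single_rna_series(q1, m)
    g2 = single_rna_series(q2, m)
    p1, p2 = g1[:], g2[:]                 # (k+1)-th powers, starting at k = 0
    weights = []
    for k in range(m + 1):
        # k bases per RNA are used by secondary pairs; the remaining m - k bases
        # are distributed over k + 1 bubbles that contain only primary pairs.
        weights.append(q3 ** k * p1[m - k] * p2[m - k])
        p1 = poly_mul(p1, g1, m)
        p2 = poly_mul(p2, g2, m)
    return weights


def order_parameter(m, q1, q2, q3):
    """Mean number of inter-RNA pairs per base, i.e. (d ln Z_d / d ln q3) / m."""
    w = contact_weights(m, q1, q2, q3)
    return sum(k * wk for k, wk in enumerate(w)) / (m * sum(w))


if __name__ == "__main__":
    for q3 in (3, 6, 12):
        print(q3, order_parameter(40, 4, 9, q3))
```

At finite `m` the curve is a smooth crossover, not a sharp transition. The singular behavior at $q_{3c}$ only emerges as the length grows, which is why a finite-size analysis is needed.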

To further verify our claims about the phase for $q_3 < q_{3c}$ and to calculate the scaling
exponents corresponding to the second order phase transition, we iterated the recursion relation
(Eq. (2)) to calculate the exact partition function for RNA of finite length $N$. The results of
the numerical calculations are in complete agreement with the phase diagram of Fig. 2 (inset)
when extrapolated to the thermodynamic limit, thus verifying our claim [19]. Next we calculate
the free energy per length $f(q_1, q_2, q_3) = -\ln Z_d(N)/N$, taking into account the finite
size effects. We assume the usual scaling form for the order parameter,
$Q(N) = N^{-1/2}\, g[(q_3 - q_{3c})\, N^{1/\nu}]$, close to the critical point. Fig. 3 shows the
resulting scaling plot, with the best fit value for the crossover exponent $\nu \approx 5/3$.
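The scaling form amounts to a change of variables applied to each data point before plotting. The following is a minimal sketch of that rescaling, not the fitting procedure used for Fig. 3: the helper names `critical_bias` and `rescale` are hypothetical, and the default value of `nu` is simply the best-fit value quoted above.

```python
import math


def critical_bias(q1, q2):
    """Critical inter-RNA Boltzmann factor q3c = sqrt(q1 * q2)."""
    return math.sqrt(q1 * q2)


def rescale(n, q3, q_value, q3c, nu=5.0 / 3.0):
    """Map one data point (N, q3, Q) to scaling-plot coordinates
    x = (q3 - q3c) * N**(1/nu) and y = Q * N**(1/2)."""
    x = (q3 - q3c) * n ** (1.0 / nu)
    y = q_value * math.sqrt(n)
    return x, y


if __name__ == "__main__":
    q3c = critical_bias(4, 9)
    print(rescale(100, 6.5, 0.08, q3c))   # illustrative input values only
```

If the scaling assumption holds, points computed for different $N$ fall onto a single curve $y = g(x)$. Varying `nu` until the collapse is best is one common way to estimate the exponent.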

This model has some similarities with the Go-like model studied by Bundschuh and Hwa, which
shows a molten-native transition. The physics behind the phase transition in their model and in
ours is the same, i.e., the competition between the energetic gain of the secondary contacts (or
native contacts in the Go-like model) and the branching entropy. But, contrary to the native
phase, where the ground state is unique, the aggregated phase has degenerate ground states. On
the other hand, both these models can 'melt' from their (aggregated or native) ground state to
any of the molten, glassy or denatured phases, depending on the temperature and the strength of
the bias. The differences in the behavior of these models arise from the fact that for the
Go-like model the bias is site specific, whereas for the model we have presented the bias is
towards a macroscopically large number of sites.

FIG. 3: (color online) Scaling plot for the order parameter. Inset shows the order parameter of
the phase transition. In both plots $q_1 = 4$ and $q_2 = 9$, hence $q_{3c} = 6$.

We would like to emphasize two simplifying assumptions of our model. First, we chose to start
from a molten phase, in which sequence heterogeneity is unimportant. It is important to note
that the bases, as we call them here, are not necessarily single bases but short segments of the
sequence (such as CAG) whose effective interaction with any other segment is the same. Even if
we do have weak sequence heterogeneity, we do not expect, based on previous work, that such
microscopic details alter the thermodynamic results presented here. However, at low enough
temperatures, where such a homogeneous approximation is no longer valid, it should be
interesting to consider the role of a suitably defined bias in the glassy phase. Second, we
considered only two RNA molecules for aggregation. In the case of multiple RNAs participating in
the folding, the ground state would depend on how the different types of RNAs are aligned to
fold. In fact, these ground states could be topologically different from the ground state of our
two-RNA model. Hence, the values of the critical exponents for the transition might change,
though the qualitative physics of aggregation, such as the critical inter-molecular base pairing
energy at which the transition takes place, would remain the same.

In summary, we have presented a simple model for heteropolymer folding using the RNA secondary
structure formulation, which shows a second order phase transition from an independently molten
to an aggregated phase. The behavior at criticality turns out to be that of the molten phase of
the concatenated molecule. The transition is completely driven by the energetics of pairing and
is temperature independent. Proteins are known to undergo a folding transition from the native
to an aggregated phase instead of from a molten to an aggregated phase. It should be interesting
to see if this study can be extended to understand the thermodynamics of such a phase
transition. It should also be interesting to study the role of the kinetics of RNA folding in
this phase transition.